Cross-document relationship classification for text summarization

نویسندگان

  • Dragomir R. Radev
  • Zhu Zhang
  • Jahna Otterbacher
چکیده

Multiple documents describing the same event present some interesting challenges for natural language processing. They contain similar information and yet they also exhibit a number of interesting properties: paraphrases, partial agreement, difference in judgment and emphasis, and contradictions. When the sources track an event that evolves over time, more phenomena can be observed: additions, updates, corrections, comments, clarifications, etc. In the current work, we introduce CST (cross-document structure theory), a paradigm for multi-document analysis. CST takes into account the rhetorical structure of clusters of related textual documents. The goals of the paper are fourfold: first, we present a taxonomy of 18 CST relationships and give our motivation for proposing the theory. Second, we present an experiment that demonstrates that CST is essential for extractive, multi-document summarization. We then argue that CST can be the basis for multi-document summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts. In particular, we propose a CST-enhancement procedure for extractive summarization, and demonstrate that CST-enhanced summaries outperform their unmodified counterparts as measured by the relative utility evaluation metric. Next we discuss the building of the CSTBank, a corpus of document clusters in which sentence pairs have been annotated for CST relationships. Finally, we discuss our efforts in the automatic detection of CST relationships between sentence pairs extracted from related documents. We show that a binary classifier for determining existence of structural relationships significantly outperforms the baseline. We also achieve promising results on the multi-class case in which the full taxonomy of relationships are considered.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

متن کامل

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatictext summarization has become an essential tool for web users. In this paperwe present a novel approach for creating text summaries. Using fuzzy logicand word-net, our model extracts the most relevant sentences from an originaldocument. The approach utilizes fuzzy measures and inference on theextracted textual information from the docu...

متن کامل

Systematic literature review of fuzzy logic based text summarization

Information Overloadrq  is not a new term but with the massive development in technology which enables anytime, anywhere, easy and unlimited access; participation & publishing of information has consequently escalated its impact. Assisting userslq    informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

متن کامل

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

Improving Multi-Document Summarization via Text Classification

Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004